8.3 DATA VS. INFORMATION: A SYSTEM PARADIGM
Fred C. Billingsley 

Jet Propulsion Laboratory, California Institute of Technology
Pasadena, California 9110^

This is a paper about thinking. 

It may not seem to be, but it is.

As such, it won't give any answers.

But it should give you some ideas. 

In the justification stage of any new system, the proponent is usually asked 
to provide some sort of Benefit/Cost analysis. Because, for a data system, 
there is no value in the data per se, justification must be found in the 
benefits in the use of the data. If, as is usually the case, the instrument 
or system designer is not a "user," his recourse is to survey the user
community to obtain some sort of consensus on the utility. The generally
unsatisfactory nature of the results is reflected in the large number of times
the users are surveyed, resurveyed, and re-resurveyed. Something must be
missing, or the answers would have been found.

The thrust here is not the justification of the system itself, but rather the 
justification of the selection of the various technical parameters which the 
system must meet. This justification (i.e., optimum parameter tradeoff) must 
be done in relation to the ability of the user to turn the cold, impersonal 
data into a live, personal decision or piece of information. 

Therein, of course, lies the sleeper: the data system designer requires data
parameters, and is dependent on the user to convert his information needs to
these data parameters. This conversion will be done with more or less
accuracy, beginning a chain of inaccuracies which propagate through the
system, and which, in the end, may prevent the user from converting the data
which he receives into the information he requires. The concept to be pursued
will be that errors will occur in various parts of the system, and, having
occurred, will propagate to the end. Modeling of the system may allow an
estimation of the effects at any point and the final accumulated effect, and
may provide a method of allocating an error budget among the system components.

Inaccuracies will be considered to be of two types, which may be stated in
terms of transfer functions for each of the system components considered:

1) Calibration: the difference between the stated transfer function and reality;
2) Uncertainty: the error bars around each stated function and measurement.
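The two error types can be made concrete with a small sketch. The Python below is illustrative only and is not taken from the paper: the `Stage` class, its field names, and all numbers are assumptions. Each stage carries a stated gain and a true gain (their difference is the calibration error) plus a one-sigma uncertainty. Uncertainties are combined in quadrature, which assumes the stage errors are independent.

```python
import math
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    stated_gain: float  # transfer function the user is told about
    true_gain: float    # what the component actually does (hypothetical)
    sigma: float        # 1-sigma uncertainty added at this stage's output


def propagate(stages, x):
    """Return (name, stated output, true output, accumulated sigma) per stage."""
    stated, true, var = x, x, 0.0
    rows = []
    for s in stages:
        stated *= s.stated_gain
        true *= s.true_gain
        # Upstream uncertainty is scaled by this stage's gain; the stage
        # then adds its own independent uncertainty.
        var = var * s.true_gain ** 2 + s.sigma ** 2
        rows.append((s.name, stated, true, math.sqrt(var)))
    return rows


chain = [
    Stage("sensor", stated_gain=2.0, true_gain=2.1, sigma=0.10),
    Stage("processing", stated_gain=1.0, true_gain=1.0, sigma=0.05),
]

for name, stated, true, sigma in propagate(chain, 1.0):
    print(f"{name}: calibration error {true - stated:+.3f}, "
          f"uncertainty {sigma:.3f}")
```

Reading the accumulated values stage by stage is one way to see where an error budget is being spent.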


This paper presents the results of one phase of research performed by the Jet 
Propulsion Laboratory, California Institute of Technology, sponsored by the 
National Aeronautics and Space Administration under Contract NAS7-100.

We begin by modeling an information system as shown in Figure 1. The forward
model is required to convert units of information to units of required data,
and answers the question "What set of measurements will best carry (i.e.,
allow the best derivation of) the information?" This box provides the set of
measurement "requirements" to the measuring system, which responds with a set
of real measurements which will hopefully be somewhat near to the desired set.
However, the data system may have an inaccurate or uncertain transfer
function, so that the set of apparent measurements presented to the
information model deviates further from reality. It is with this set that the
user attempts to derive his information, using the Information Model.

Evaluation of the system takes place in two levels as suggested in Figure 2. 
Note that the evaluation (and, therefore, the design) of a total information 
system is the joint responsibility of the user and the data system designer, 
as model boxes under the cognizance of each are involved. The data system
designer cannot be held responsible for the inadequacies/uncertainties in
either the forward or the information models, although he is deeply interested
in the validity of each.

Care must be taken in designing the models, and the systems which they
represent. Figure 3 applies to both models and systems. At the low end of
complexity, the system may only provide a nominal solution to the information
problem, and so the potential errors due to the design may be quite large.
But at least the data can be obtained. At the other extreme, a complex
model can produce the desired results quite precisely, if only the data
required for the solution could be obtained. If it can be identified, the
saddle point is the optimum complexity to design to. In the case of
registration of Landsat, for example, the saddle point may be found to be at
the 0.5-1.5 pixel level, fairly broad, with the moderate gains obtained with
very complex processing being very costly or the requisite complex data (e.g.,
world-wide GCPs) being unobtainable, or at the low complexity end, simple
processing producing only moderate registration accuracy.

SYSTEM DESIGN

We will be concerned primarily with the data system design. This includes the 
choice of the parameters (e.g., spectral bands, resolution, etc.), the 
exactness with which they must be maintained, the calibration process 
including the availability of required ancillary data, data latency, and the
uncertainties associated with each of these items. This must be done in the 
context of the complete information system. The data system block is 
diagrammed in Figure 4. 

Two approaches may be taken to the data system design: 1) Optimize the data
system by minimizing the summation of the deviations of the delivered products
from the desired measurements; 2) Optimize the total information system by
minimizing the decreases in obtainable information (by the users) due to
deviations in the desired measurements from the requested set. One of these
approaches is used implicitly, if not explicitly, in any system design. They
do not necessarily lead to the same choice of parameters.

Thus, the data system design model for 1) is diagrammed in Figure 5. It can
be seen that this is a linear programming problem. The loss function to be
minimized is the sum of the deviations in the data delivered from each
discipline request, weighted according to the importance of the various
disciplines. The parameters available to the designer are the sensor bogie
parameters, anticipated interference factors (such as sensor variations,
ground altitude relief displacements, orbit uncertainties, etc.), the ability
to measure these, the calibration forward model (i.e., how do we plan to
remove the errors?), data system procedures, availability/accuracy of
calibration references, and the procedures used to rectify (apply the
calibrations). To properly choose between the parameters, coefficients
pertaining to the sensitivity of results to variations in each parameter and
to the importance of the various parameters are required. (For example, how
important to Discipline A is the difference between prompt registration to 1
pixel vs. delayed registration to 0.3 pixel; how important are these relative
to overlay matching or to absolute geodetic location; and how important is
Discipline A in the total scheme of things?) These coefficients, if available
at all, will generally be only poorly known. Note, however, that if they are
not explicitly stated, they will be assumed by the data system designer with
or without affirmation by the discipline users. A caveat to the users!
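To show what making these coefficients explicit might look like, here is a minimal Python sketch. It is not the paper's linear program: it only evaluates a weighted loss for two hand-picked candidate designs. The discipline names, parameter names, sensitivity and importance values are all hypothetical, and every parameter is assumed to be "smaller is better," so only shortfalls against the request are penalized.

```python
def weighted_loss(delivered, requested, sensitivity, importance):
    """Importance-weighted sum, over disciplines, of the sensitivity-weighted
    shortfalls of the delivered parameters against each discipline's request."""
    total = 0.0
    for disc, req in requested.items():
        shortfall = sum(
            sensitivity[disc][p] * max(0.0, delivered[p] - req[p])
            for p in req
        )
        total += importance[disc] * shortfall
    return total


# Hypothetical requests from two disciplines.
requested = {
    "A": {"registration_px": 0.3, "latency_days": 2.0},
    "B": {"registration_px": 1.0, "latency_days": 10.0},
}
# Hypothetical coefficients; in practice these are the poorly known values.
sensitivity = {
    "A": {"registration_px": 1.0, "latency_days": 0.1},
    "B": {"registration_px": 0.5, "latency_days": 0.2},
}
importance = {"A": 0.6, "B": 0.4}

# Two candidate designs: prompt but coarse, or delayed but fine.
candidates = {
    "prompt": {"registration_px": 1.0, "latency_days": 2.0},
    "delayed": {"registration_px": 0.3, "latency_days": 14.0},
}

losses = {
    name: weighted_loss(design, requested, sensitivity, importance)
    for name, design in candidates.items()
}
best = min(losses, key=losses.get)
print(losses, "->", best)
```

Changing a single sensitivity value can reverse the choice, which is the practical force of the caveat: a coefficient the user does not state is one the designer will pick.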

In approach 2), the user and his forward and information models are explicitly
treated, as demonstrated in Figure 6. In this case, the information system
design must take into account the effect of the real data on the information
conversion in the information models, recognizing that it will be different
from the desired data and will be accompanied by the accumulated
uncertainties. In addition to the set of data system parameters, available
also are potential changes in the forward and information models (e.g., the
user may have to do things differently than first planned if the anticipated
real data is too divergent from the data desired or if it will be accompanied
by too large errors). In an information-driven system, the information losses
allowed will place tolerances on the real data. This requires that the
information model be accompanied by a sensitivity analysis. The
information/forward model linear programming optimization allows the user
to trade off the various desired parameters requested, allowing for the
anticipation of data realities and the influences of the other disciplines on
the total information system outcome.

Again, a caveat to the users: this procedure, usually implicit, requires the
choosing of sensitivity coefficients, also usually implicit. The user with
particularly sensitive requirements had best make his needs known!

Just as the various errors propagate to the end ("downstream"), in an
information-driven system the tolerances will propagate upstream. The
implication is that, in contrast to the normal single-thread system which
requires ever-tighter tolerances in the earlier stages, it may be possible for
certain users to pick off data earlier in the data stream before errors have
had a chance to accumulate, and for them to do their own processing. This may
relieve error tolerances on the remainder of the system.

In addition to the sensitivity coefficients for a single parameter, cross
coefficients may be important in analyzing the tradeoffs. Four examples of
interdependency of variables are sketched in Figure 7. For example, with
Variable A being spatial resolution and Variable B being data rate: In Case I
(upper left), a user may have lots of computer capability, so that data
quantity is no problem, but increasing resolution improves things up to a
point after which (say) scatter in the data decreases his performance. In
Case IV (lower right) a smaller user finds the same type of resolution
optimization, but total data quantity hurts, so that at some point the
increase of data with resolution becomes the limiting factor.

As a second example, let the variables be ability to register (in pixels) vs.
the pixel size, and let us consider three cases: 1) the user doesn't care about
registration at all, because he is only looking at a single image; 2) desired
geodetic location (in, say, meters) is constant regardless of resolution
because the user must register to GCPs at the same location accuracy
independent of resolution; this user requires that the per pixel registration
get better as the pixels get larger; 3) for overlay purposes, the same
fractional pixel accuracy is required regardless of the pixel size. These are
sketched in Figure 8, together with a hypothetical system performance. The
heavy line in Figure 8 indicates the ridge of optimum performance from the
user point of view. The intersection of the anticipated system performance
with the user ridge indicates the design optimum.
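The three user cases can be written as registration requirements in pixels as a function of pixel size, and compared with a system performance curve. The following Python sketch does this under loudly stated assumptions: the 30 m geodetic requirement, the 0.5 pixel overlay requirement, and the system curve (a fixed 15 m ground error plus a 0.2 pixel processing floor) are all invented for illustration and do not come from Figure 8.

```python
import math


def required_px(case, pixel_m, geodetic_m=30.0, overlay_px=0.5):
    """Registration accuracy (in pixels) a user needs at a given pixel size."""
    if case == "single_image":
        return math.inf                # case 1: no registration requirement
    if case == "geodetic":
        return geodetic_m / pixel_m    # case 2: constant accuracy in meters
    if case == "overlay":
        return overlay_px              # case 3: constant fraction of a pixel
    raise ValueError(f"unknown case: {case}")


def system_px(pixel_m, ground_error_m=15.0, floor_px=0.2):
    """Hypothetical achievable registration error, in pixels."""
    return ground_error_m / pixel_m + floor_px


def feasible_pixel_sizes(sizes_m, cases):
    """Pixel sizes at which the system meets every listed user case."""
    return [
        p for p in sizes_m
        if all(system_px(p) <= required_px(c, p) for c in cases)
    ]


sizes = [20, 40, 60, 80, 100]  # candidate pixel sizes in meters
for cases in (["overlay"], ["geodetic"], ["single_image", "geodetic", "overlay"]):
    print(cases, "->", feasible_pixel_sizes(sizes, cases))
```

With these made-up numbers the overlay user pushes toward larger pixels and the geodetic user toward smaller ones, so the jointly acceptable region is narrow. That narrowing is the kind of interaction the operating-point selection is meant to expose.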

It is realized that the bogie parameters, sensitivity coefficients, and
cross-sensitivity coefficients required to do a quantitative system
optimization will generally not be available. Nevertheless, these are
implicitly defined in the system designer's mind. He will make a mental
evaluation of the user forward and information models, and try to decide
which parameters are important and which can be slighted, and then proceed to
the data system design. Much fundamental research remains to be done to
define the forward and information user models and to obtain the sensitivity
factors, to allow these to be used in a quantitative total information system
design.

SYSTEM ANALYSIS 

Somehow the system design is arrived at, the sensor built and data delivered. 
The user must then work with whatever data is now available, together with its 
errors. At that point he has only the information model to vary — that is, he 
will do whatever necessary to derive the desired information. We will leave 
him his troubles, and consider the data system itself. The task at this 
point is to evaluate the system performance (see Figure 9).

The normal desire in designing a system is that each function be a 1:1
translation, with a change in dimensions only, until finally the "correct
measurement" is the same as the "desired measurement". We will therefore
model each function as "somewhat linear, with a bias" (figure left), but let
the output be produced with some uncertainty (figure right). For a linear
system with the input having any possible value with equal probability, the
probability that the output has a value in a certain range is found by
convolving the probability density function of the error with that of the
signal, and integrating between limits representing the range of interest.
(Note that the probability density function of the error is equivalent,
in one dimension, to the point spread function of the image case. The symbol
h will therefore be used.) Define the "gain" of each stage as Output/Input =
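a, so that for a two-stage system,

$$y_2 = a_2 (a_1 x + e_1) + e_2 = a_1 a_2 x + a_2 e_1 + e_2$$

where $x$ is the input and $e_1$, $e_2$ are the errors introduced by the
first and second stages.

For constant total system gain $a_1 a_2$, minimum error occurs when
$a_1 \gg a_2$. This leads to the engineers' old rule of thumb: put as much
gain ahead of any noise sources as possible.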
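The rule of thumb can be checked numerically. The sketch below is an illustration, not the paper's derivation: it assumes each stage multiplies its input by its gain and then adds an independent zero-mean error, so the first-stage error is amplified by the second-stage gain before the second-stage error is added. The function name and the numbers are hypothetical.

```python
import math


def output_rms_error(a2, sigma1, sigma2):
    """RMS error at the output of a two-stage chain.

    Stage 1 adds an error of RMS sigma1, which stage 2 amplifies by a2;
    stage 2 then adds its own independent error of RMS sigma2.
    """
    return math.sqrt((a2 * sigma1) ** 2 + sigma2 ** 2)


total_gain = 100.0   # held constant: a1 * a2
sigma1 = sigma2 = 1.0

errors = []
for a1 in (1.0, 10.0, 100.0):
    a2 = total_gain / a1
    err = output_rms_error(a2, sigma1, sigma2)
    errors.append(err)
    print(f"a1={a1:6.1f}  a2={a2:6.1f}  output RMS error={err:8.3f}")
```

With the total gain fixed, moving gain into the first stage shrinks the amplification applied to the first-stage error, so the output error falls toward the floor set by the second stage alone.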

If the error sources are Gaussian, the convolutions become root-mean-square
additions. However, in the registration case, it is not yet clear whether the
error sources are Gaussian, so the RMS addition must be used with caution. It
is also not clear whether an RMS statement of the errors is the one most
useful to the user in evaluating the system performance. For example, it may
be more important for the user to know where the displacement errors occur
(worse in areas of high relief and predictable in direction) than it is for
him to know an RMS value (which in itself may be suspect).
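A small numerical case shows why the caution matters. In the Python sketch below, two hypothetical, independent, non-Gaussian error sources each take the values -1, 0, or +1 pixel with equal probability. Their density functions are convolved directly, and the result is compared with what a Gaussian assumption would predict from the RMS sum alone. The distributions and the 2 pixel threshold are assumptions chosen for illustration.

```python
import math


def convolve(p, q):
    """Discrete convolution of two probability lists on unit-spaced grids."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


def std(pdf, first_value):
    """Standard deviation of a discrete pdf whose first bin is first_value."""
    values = [first_value + k for k in range(len(pdf))]
    mean = sum(v * w for v, w in zip(values, pdf))
    return math.sqrt(sum((v - mean) ** 2 * w for v, w in zip(values, pdf)))


h1 = [1 / 3, 1 / 3, 1 / 3]   # errors of -1, 0, +1 pixel, equally likely
h2 = [1 / 3, 1 / 3, 1 / 3]
h_total = convolve(h1, h2)   # combined error on -2 .. +2 pixels

sigma_rss = math.sqrt(std(h1, -1) ** 2 + std(h2, -1) ** 2)
sigma_true = std(h_total, -2)

# Probability of an error larger than 2 pixels in magnitude.
p_true = 0.0  # impossible here: the combined error is bounded by +/- 2
p_gauss = math.erfc(2.0 / (sigma_rss * math.sqrt(2.0)))

print(f"RMS sum {sigma_rss:.4f}, true sigma {sigma_true:.4f}")
print(f"P(|error| > 2 px): true {p_true:.3f}, Gaussian estimate {p_gauss:.3f}")
```

The RMS value itself combines correctly because the sources are independent, but the shape of the combined distribution does not. A user who reads the RMS figure as a Gaussian sigma would expect large errors that cannot actually occur here, and for other distributions could just as easily underestimate them.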

It should be noted that a statement of the errors occurring in various parts 
of the system is of marginal use by itself unless, perhaps, one or more is 
glaringly bad. Not until the system model is built (implicitly or explicitly) 
can the error propagation be estimated. During system design, the propagation 
estimate is used to establish tolerances on the components, and during
evaluation it will be used with the actual expected errors to check
performance and to identify critical error contributors. After the
contributing error sources and their interactions are identified, the
following questions may be asked of each source and of the system as a whole:

* What is the intended component performance?

* What is the component actual expected performance?

* How may the performance be verified?

* What correction methods are available for system use?

* What correction methods are available for user use?

* How well can the correction methods potentially work?

* How well may the correction methods actually work?

* How do inaccuracies propagate through the system?

* How do uncertainties propagate through the system?

* Where are the major inaccuracy or uncertainty sources?


Finally, it is to be expected that there may be breakdowns during operation, 
or that there may be operational problems in performing the component 
functions. Considering probability of correct operation as a system
criterion, the following questions are pertinent: 

* What slack is there in the design to allow for problems?

* If a problem occurs, will the system fail catastrophically or
  gracefully?

* What is the probability that the system will remain up (within specs)
  for X% of the time?

* How hard does the system seem to be to operate?

* What potential for operator errors is present?

* Are work-arounds for various envisioned errors defined?

* How friendly are the system interfaces to the users?

* Where are the operational bottlenecks?

* Are there any serious single-point failure points?

FINAL POINTS 


It can be seen that the various "User Requirements Surveys" have not asked the
right questions, or at least have not asked the questions within a milieu to
allow the user to respond with the coefficients required by the system
designer. The most recent system survey by GSFC has taken a step in the right
direction by presenting to the users several potential systems among which the
users were to indicate the relative usefulness. But the necessary grossness
of the differences prevents any fine tuning of the parameters.

It is not clear that this fine tuning is even possible, given the diversity of
users within each discipline, let alone among the disciplines. No plateaus
of, say, registration accuracy, have been found beyond which there is a marked
loss of utility of the data. The loss of utility with poorer performance has
not, perhaps cannot, be stated for the various disciplines. And the
aggregation of the losses will produce a loss curve with a gradual slope, with
no cliffs.


In the long run, it may well be found that all of the potentially obtainable
information is already in the user surveys which are available and that users
really cannot define their coefficients, much less anticipate the coefficients
of others. This is the "low complexity" end of the spectrum. In this case,
the advances in system performance will be more technology driven, and the
users must make of it what they will. (In any event, once a system is
designed, this is the situation.) It then remains to the system personnel to
define what data quality results at various points in the system, and
hopefully to allow the users to obtain data of various quality to suit their
needs. The system error model will be used to do the evaluation, to identify
and remove successive predominant error sources, to provide ever-better data
to the users, and to serve as a source of information for subsequent systems.

Figure 3. Model and System Complexity Optimization


Figure 4. Data System Block Diagram 



Figure 5. Design Model for Data System Optimization

Figure 7. Interaction of Sensitivity Coefficients
(Four panels, each plotted against Variable A and Variable B:
A has a max, with no preferred value for B; the max of A does not depend on B.
A and B each have a max, but there is no dependence of one on the other; each
may be optimized separately.
A has a max which depends on the value of B.
The max of A and the max of B are interrelated, with one broad co-max point.)




Figure 8. System Operating Point Selection